Phonetic feature extraction for context-sensitive glottal source processing

Authors

  • John Kane
  • Matthew P. Aylett
  • Irena Yanushevskaya
  • Christer Gobl
Abstract

The effectiveness of glottal source analysis is known to depend on the phonetic properties of the concomitant supraglottal features. Phonetic classes such as nasals and fricatives are particularly problematic: their acoustic characteristics, including zeros in the vocal tract spectrum and aperiodic noise, can degrade glottal inverse filtering, a necessary prerequisite to glottal source analysis. In this paper, we first describe and evaluate a set of binary feature extractors for phonetic classes relevant to glottal source analysis. As voice quality classification is typically achieved using feature data derived by glottal source analysis, we then investigate the effect on classification accuracy of removing data from certain detected phonetic regions. For the phonetic feature extraction, classification algorithms based on Artificial Neural Networks (ANNs), Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs) are compared. Experiments demonstrate that the discriminative classifiers (ANNs and SVMs) generally give better results than the generative learning algorithm (GMMs). Accuracy generally decreases with the sparseness of the feature (e.g., it is lower for nasals than for syllabic regions). We find the best voice quality classification when using only glottal source parameter data derived within detected syllabic regions.

© 2013 Published by Elsevier B.V.
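As an illustration of the generative approach the abstract contrasts with ANNs and SVMs, a minimal sketch of GMM-style binary phonetic detection is shown below. This is not the authors' implementation: the synthetic feature vectors, the single diagonal Gaussian per class (a one-component GMM), and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frame-level acoustic features:
# class 1 = frames inside the target phonetic region, class 0 = outside.
X_pos = rng.normal(loc=1.0, scale=1.0, size=(200, 4))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(200, 4))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

def fit_gaussian(data):
    """Maximum-likelihood mean and diagonal variance for one class."""
    return data.mean(axis=0), data.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Log-density of a diagonal Gaussian, summed over feature dimensions."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

mu1, v1 = fit_gaussian(X_pos)
mu0, v0 = fit_gaussian(X_neg)

# Generative decision rule: assign each frame to the class whose
# Gaussian explains it better (equal priors assumed).
pred = (log_likelihood(X, mu1, v1) > log_likelihood(X, mu0, v0)).astype(float)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A discriminative classifier (ANN or SVM) would instead model the decision boundary directly rather than the per-class densities, which is the distinction the experiments above evaluate.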


Similar articles

Using phonetic feature extraction to determine optimal speech regions for maximising the effectiveness of glottal source analysis

Parameterisation of the glottal source has become increasingly useful for speech technology. For many applications it may be desirable to restrict the glottal source feature data to only speech regions where it can be reliably extracted. In this paper we exploit the previously proposed set of binary phonetic feature extractors to help determine optimal regions for glottal source analysis. Besid...


Physiologically-motivated Feature Extraction Methods for Speaker Recognition

Jianglin Wang, B.S., M.S., Marquette University, 2013. Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overal...


Robust LP analysis using glottal source HMM with application to high-pitched and noise corrupted speech

This paper presents a robust feature extraction method effective to speech signal with high fundamental frequency and/or corrupted by additive white noise. The method represents the glottal source wave using HMM in order to model the nonstationary properties. The nodes of HMM are concatenated in a ring state to represent the periodicity of voiced sounds. The method can accurately extract glotta...


Zeros of the z-transform (ZZT) representation and chirp group delay processing for the analysis of source and filter characteristics of speech signals

This study proposes a new spectral representation called the Zeros of Z-Transform (ZZT), which is an all-zero representation of the z-transform of the signal. In addition, new chirp group delay processing techniques are developed for analysis of resonances of a signal. The combination of the ZZT representation with the chirp group delay processing algorithms provides a useful domain to study re...
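The ZZT representation described above is mathematically simple: for a windowed frame x[0..N-1], the z-transform X(z) = Σ x[n] z^(-n) is, up to a factor of z^(-(N-1)), a polynomial whose coefficients are the samples themselves, so the ZZT is just that polynomial's root set. A minimal sketch (the synthetic damped-cosine frame is an assumption for illustration):

```python
import numpy as np

# A short synthetic frame: a decaying cosine, i.e. one damped resonance.
n = np.arange(32)
x = (0.95 ** n) * np.cos(2 * np.pi * 0.1 * n)

# X(z) = sum_n x[n] z^(-n) = z^(-(N-1)) * P(z), where P has coefficients
# x[0], x[1], ..., x[N-1] from highest degree down -- exactly the ordering
# numpy.roots expects. The ZZT is the root set of P.
zzt = np.roots(x)

# In the ZZT framework, the radii of the zeros (relative to the unit
# circle) are what separates source- and filter-related contributions.
radii = np.abs(zzt)
print(len(zzt), radii.min(), radii.max())
```

For a length-N frame with x[0] ≠ 0 this yields N-1 zeros; real analyses apply this to appropriately windowed speech frames rather than a synthetic signal.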


Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface

Recent improvements are presented for phonetic decoding of continuous-speech from ultrasound and optical observations of the tongue and lips in a silent speech interface application. In a new approach to this critical step, the visual streams are modeled by context-dependent multi-stream Hidden Markov Models (CD-MSHMM). Results are compared to a baseline system using context-independent modelin...




Journal:
  • Speech Communication

Volume 59, Issue -

Pages -

Published: 2014